Distance and Similarity Measures Effect on the Performance of K-Nearest Neighbor Classifier - A Review

نویسندگان

  • V. B. Surya Prasath
  • Haneen Arafat Abu Alfeilat
  • Omar Lasassmeh
  • Ahmad B. A. Hassanat
چکیده

The K-nearest neighbor (KNN) classifier is one of the simplest and most common classifiers, yet its performance competes with the most complex classifiers in the literature. The core of this classifier depends mainly on measuring the distance or similarity between the tested example and the training examples. This raises a major question about which distance measures to be used for the KNN classifier among a large number of distance and similarity measures? This review attempts to answer the previous question through evaluating the performance (measured by accuracy, precision and recall) of the KNN using a large number of distance measures, tested on a number of real world datasets, with and without adding different levels of noise. The experimental results show that the performance of KNN classifier depends significantly on the distance used, the results showed large gaps between the performances of different distances. We found that a recently proposed non-convex distance performed the best when applied on most datasets comparing to the other tested distances. In addition, the performance of the KNN degraded only about 20% while the noise level reaches 90%, this is true for all the distances used. This means that the KNN classifier using any of the top 10 distances tolerate noise to a certain degree. Moreover, the results show that some distances are less affected by the added noise comparing to other distances.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Comparing pixel-based and object-based algorithms for classifying land use of arid basins (Case study: Mokhtaran Basin, Iran)

In this research, two techniques of pixel-based and object-based image analysis were investigated and compared for providing land use map in arid basin of Mokhtaran, Birjand. Using Landsat satellite imagery in 2015, the classification of land use was performed with three object-based algorithms of supervised fuzzy-maximum likelihood, maximum likelihood, and K-nearest neighbor. Nine combinations...

متن کامل

[Proceeding] A multi-measure nearest neighbor algorithm for time series classification

In this paper, we have evaluated some techniques for the time series classification problem. Many distance measures have been proposed as an alternative to the Euclidean Distance in the Nearest Neighbor Classifier. To verify the assumption that the combination of various similarity measures may produce a more accurate classifier, we have proposed an algorithm to combine several measures based o...

متن کامل

A Multi-measure Nearest Neighbor Algorithm for Time Series Classification

In this paper, we have evaluated some techniques for the time series classification problem. Many distance measures have been proposed as an alternative to the Euclidean Distance in the Nearest Neighbor Classifier. To verify the assumption that the combination of various similarity measures may produce a more accurate classifier, we have proposed an algorithm to combine several measures based o...

متن کامل

Comparison of Distance Metrics for Phoneme Classification based on Deep Neural Network Features and Weighted k-NN Classifier

K-nearest neighbor (k-NN) classification is a powerful and simple method for classification. k-NN classifiers approximate a Bayesian classifier for a large number of data samples. The accuracy of k-NN classifier relies on the distance metric used for calculating nearest neighbor and features used for instances in training and testing data. In this paper we use deep neural networks (DNNs) as a f...

متن کامل

How k-Nearest Neighbor Parameters Affect its Performance

The k-Nearest Neighbor is one of the simplest Machine Learning algorithms. Besides its simplicity, k-Nearest Neighbor is a widely used technique, being successfully applied in a large number of domains. In k-Nearest Neighbor, a database is searched for the most similar elements to a given query element, with similarity defined by a distance function. In this work, we are most interested in the ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1708.04321  شماره 

صفحات  -

تاریخ انتشار 2017